ABSTRACT

A multiplier circuit is one of the most important functional blocks of many nano-electronic,
control and automation applications. In this work, an energy-efficient multiplier based on a
3:2 compressor is reported. The multiplier is designed in three steps. In the first step, the
partial products (PPs) are generated by using AND gates. In the second step, termed partial
product processing (PPP), the partial products are reduced. In the third step, the final
addition is performed. The PPP is designed in two phases. In the first phase, the Wallace tree
algorithm is used to reduce the PPs, whereas in the second phase the PPs are reduced by using
an energy-efficient half adder and a 3:2 compressor. In the third step, the final addition is
computed by using a carry-save adder. The performance of the designed multiplier is evaluated
and compared with other multiplier circuits. The multiplier shows performance improvements of
20.55%-46% for power supply variation from 1.2 V to 0.6 V. All simulations and analyses have
been carried out by using the Synopsys EDA tool.

Keywords: Partial products, PDP, PPP, Wallace
1. INTRODUCTION

A multiplier is an essential block of many nano-electronic, control and automation
applications [1]. Wherever multiplication is required, a multiplier module (circuit) is needed [2]. Thus, it has
applications in computer vision, computer-aided design, image processing, DSP processors, MAC units,
communication systems, filters, IoT applications, and VLSI circuits and systems [3]. The speed of operation,
power consumption and complexity of these applications therefore depend to some extent on the core multiplier
modules [4, 5]. In nano-electronic research, designing low-power, high-speed multiplier circuits in
nanotechnology for VLSI applications is one of the current research objectives [6]. Low-power and high-speed
multipliers have been designed by adopting different approaches and techniques [7]. Each design has its own
advantages and disadvantages. The architectures and techniques reported in the literature can be found in [8-16].
These approaches rely either on an algorithm or on a new architecture. It has been observed that these designs
have drawbacks with respect to each other. Three major issues are observed: high power consumption,
low speed of operation and complexity of architecture. Thus, in this work, an energy-efficient multiplier has
been designed by using an algorithm and adopting a new structural module.

The multiplier reported here is a 4x4 multiplier. It is designed in three steps. In the first step, the
partial products are generated from two 4-bit numbers; in the second step, the partial products are reduced; and
in the final step, a fast addition yields the final product of the 4x4 multiplier. In the 1st step, AND gates are
used to obtain the partial products, which are generated by multiplying each bit of one number by each bit of
the other number. The PPs are reduced by using the Wallace tree algorithm, because it is one of the oldest and
most effective techniques used to reduce multiplication complexity [8]. Another reason for its widespread
popularity is its speed of operation [9]. The final step is the fast addition, which is done by a carry-save
adder (CSA). The main reason for using a carry-save adder is that it is one of the fast adders. The performance
of the designed multiplier is compared with two existing Wallace multipliers reported in [5] and [11]. Since the
Wallace tree algorithm is used for the PPP, the designed multiplier is also known as a Wallace multiplier. In the
rest of the paper, the terms multiplier and Wallace multiplier are used interchangeably unless stated otherwise.
The rest of this manuscript is structured as follows: Section 2 explains the details of the designed multiplier,
Section 3 presents the performance analysis and discussion, and Section 4 concludes the article, followed by the
references.


2. METHODOLOGY 

In any MxN multiplication, the first step is the generation of the partial products of the two
numbers M and N. The partial products are then added up to get the final result. In a Wallace
multiplier, the process is the same but is done in three steps [8, 11, 12]. Initially, the partial products (PPs)
are produced by multiplying the two inputs; this is followed by partial product processing (PPP). At last, the
final addition (FA) is performed by a fast adder [13]. In the PPP, the PPs are organised into stages by
distributing the partial products according to the Wallace tree algorithm. Each stage contains a number of rows,
calculated as shown in (1) [5], where i is the stage index and S_i is the number of rows in stage i.


S_{i+1} = 2 \lfloor S_i / 3 \rfloor + (S_i \bmod 3)    (1)
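
As a quick check of (1), the following minimal Python sketch iterates the recurrence until two rows remain. It assumes that the division in (1) is an integer (floor) division, which is what reproduces the 4, 3, 2 row counts described for the 4x4 case; the function name `wallace_stage_rows` is illustrative and does not come from the original design.

```python
def wallace_stage_rows(initial_rows):
    """Return the number of partial-product rows in each Wallace stage, per (1).

    Assumption: the S_i / 3 term in (1) is a floor division.
    """
    rows = [initial_rows]
    while rows[-1] > 2:
        s = rows[-1]
        # Every full group of three rows is compressed into two rows;
        # the remaining (s mod 3) rows pass to the next stage unchanged.
        rows.append(2 * (s // 3) + s % 3)
    return rows


print(wallace_stage_rows(4))  # 4x4 multiplier: [4, 3, 2]
```

For a 4x4 multiplier the sequence ends after three stages, which matches the stage count used in the rest of this section.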


In this work, the multiplier is designed in the three steps mentioned above: the partial
products (PPs) are generated from the inputs, then processed (PPP), and finally added (FA) by a
fast adder. AND gates are used for PP generation. The PPP is designed in two phases. In the first phase, the
Wallace tree algorithm is used to reduce the PPs, whereas in the second phase the PPs are added by
using an energy-efficient half adder and a 3:2 compressor. Since the proposed multiplier is 4x4 bits, the two
4-bit inputs give 16 partial products, which are reduced by the PPP. The final addition is done once
the reduction of the partial products is complete. The three design steps are described in detail
below.


2.1. 1st step:

The 1st step is partial product generation. The partial products are generated by using AND gates.
The AND gate used here is of CMOS type, because it outperforms other types of AND gates [17]. Since
the proposed multiplier is 4x4 bits, there are 16 partial products, and thus 16 CMOS AND gates are
required to produce them. Let the two numbers be input1 = a3a2a1a0 and input2 = b3b2b1b0. The partial
products are then p00=b0a0, p01=b0a1, p02=b0a2, p03=b0a3, p10=b1a0, p11=b1a1, p12=b1a2, p13=b1a3,
p20=b2a0, p21=b2a1, p22=b2a2, p23=b2a3, p30=b3a0, p31=b3a1, p32=b3a2 and p33=b3a3, as shown in
Figure 1. Similarly, an NxN multiplier has N^2 partial products and thus requires
the same number of AND gates.


            a3  a2  a1  a0
        x   b3  b2  b1  b0
            ---------------
            p03 p02 p01 p00
        p13 p12 p11 p10
    p23 p22 p21 p20
p33 p32 p31 p30


Figure 1. All the partial products of two numbers input1 and input2
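
The partial product array of Figure 1 can be sketched in a few lines of Python. This is a behavioural illustration only: a bitwise AND stands in for each CMOS AND gate, and the helper names `partial_products` and `weighted_sum` are hypothetical. The index convention follows the text, with p[i][j] = b_i AND a_j.

```python
def partial_products(a, b, n=4):
    """Return p[i][j] = b_i AND a_j for two n-bit unsigned integers."""
    a_bits = [(a >> j) & 1 for j in range(n)]  # a_bits[0] is a0 (LSB)
    b_bits = [(b >> i) & 1 for i in range(n)]  # b_bits[0] is b0 (LSB)
    return [[b_bits[i] & a_bits[j] for j in range(n)] for i in range(n)]


def weighted_sum(p):
    """Add all partial products with their binary weights 2^(i+j)."""
    return sum(bit << (i + j)
               for i, row in enumerate(p)
               for j, bit in enumerate(row))


p = partial_products(0b1011, 0b0110)   # 11 x 6
print(sum(len(row) for row in p))      # 16 partial products for a 4x4 multiplier
print(weighted_sum(p))                 # 66
```

Summing the weighted partial products recovers the ordinary product, which is the quantity the following two steps compute in hardware.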


2.2. 2nd Step: 

The 2nd step is partial product processing (PPP). Here, the partial products are reduced by using the
Wallace algorithm, where the PPs are rearranged in a tree-like parallel structure. This is achieved by
organising the rows above into stages according to the Wallace tree algorithm. The algorithm is repeated until
the last stage contains only two rows of partial products. Equation (1) is used for this purpose. A flow chart
of the procedure is shown in Figure 2.

Figure 2. Flowchart for Wallace multiplier


To illustrate further, Figure 3 shows how the PPP (partial product processing) is carried out. The
1st stage contains 4 rows, the 2nd stage contains 3 rows and the 3rd stage contains only two rows of
PPs. Thus, the algorithm stops at the 3rd stage, i.e., it is the final stage. The partial products of each
stage except the final one are computed by using half adders and 3:2 compressors. Compressors are used
because they are energy-efficient for multiplication [17-22]. For two rows of PPs a half adder
(HA) is used, whereas for three rows a 3:2 compressor is used, which reduces three rows of partial
products to two. The HA used here is the one reported in [18], whereas the 3:2 compressor is designed by using
the architecture described in [17] and [23].

Figure 3. PPP by Wallace tree algorithm (black dots indicate PPs)
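
The logical behaviour of the two reduction cells can be sketched as follows. This is only a gate-level truth-function model under the usual definitions (a 3:2 compressor takes three equally weighted bits and returns a sum bit and a carry bit); it does not reproduce the transistor-level circuits of [17], [18] or [23], and the function names are illustrative.

```python
def half_adder(a, b):
    """Two input bits -> (sum, carry)."""
    return a ^ b, a & b


def compressor_3_2(a, b, c):
    """Three input bits of equal weight -> (sum, carry).

    The carry has twice the weight of the sum, so a + b + c == s + 2 * carry.
    """
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)  # majority of the three inputs
    return s, carry


for bits in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(bits, "->", compressor_3_2(*bits))
```

Because the carry output is shifted one column to the left, each use of the compressor replaces three bits in a column by two bits, which is how three rows become two.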
2.3. 3rd step:


The third step is the fast addition. In the PPP, when a stage contains only two rows, the Wallace
algorithm stops there, i.e., there are no further stages. As shown in Figure 3, stage 3 contains only two
rows of partial products, so it is the final stage. An addition is therefore required to combine these two rows
of partial products. Since this is the last stage, a fast addition is needed to keep the delay low and so
improve the performance of the multiplier. Thus, a fast adder is used in this step. The fast adder used here is
a 4-bit carry-save adder (CSA), as the CSA is one of the fastest adders [9].
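
Putting the three steps together, the sketch below is a bit-level behavioural model in Python, not the transistor-level circuit. It makes several assumptions: each PP row is held as an integer, groups of three rows are compressed row-wise with the 3:2 compressor logic (a compressor with one input at zero behaves as a half adder), and Python's integer addition stands in for the final adder. The name `wallace_multiply` is illustrative.

```python
def wallace_multiply(a, b, n=4):
    """Behavioural model of the three-step multiplier for n-bit unsigned inputs.

    Returns (product, stage_sizes), where stage_sizes lists the number of
    rows in each stage of the partial product processing.
    """
    # Step 1: one PP row per bit of b (a AND b_i), shifted to its binary weight.
    rows = [(a if (b >> i) & 1 else 0) << i for i in range(n)]
    stage_sizes = [len(rows)]

    # Step 2: compress each group of three rows into two until two rows remain.
    while len(rows) > 2:
        next_rows = []
        full = len(rows) - len(rows) % 3
        for k in range(0, full, 3):
            x, y, z = rows[k:k + 3]
            next_rows.append(x ^ y ^ z)                           # sum bits
            next_rows.append(((x & y) | (y & z) | (x & z)) << 1)  # carry bits
        next_rows.extend(rows[full:])  # leftover rows pass through unchanged
        rows = next_rows
        stage_sizes.append(len(rows))

    # Step 3: final addition of the remaining rows.
    # Assumption: integer '+' stands in for the final adder of the design.
    return sum(rows), stage_sizes


print(wallace_multiply(11, 6))  # (66, [4, 3, 2])
```

Checking all 256 input pairs against ordinary multiplication is a cheap way to confirm that the reduction preserves the value at every stage.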
3. RESULTS AND DISCUSSION

The designed Wallace multiplier is the combination of the three steps discussed above. Its performance
is evaluated by simulating the multiplier with the Synopsys EDA tool at room temperature. The technology
node used here is 90 nm CMOS (PDK). The results are shown in Table 1. Power, delay, power-delay
product (PDP) and energy-delay product (EDP) have been calculated. The performance is then compared with the
conventional design [5] and Hussain [11]. It is observed that although the conventional Wallace multiplier has
the lowest power consumption, the proposed design has the best delay, PDP and EDP. The delay has been minimised
by using the 3:2 compressor and CSA. Since the proposed design has the lowest delay and only moderately higher
power consumption than the other Wallace multipliers considered, it has the best PDP and EDP. Thus, the
designed multiplier is the most energy-efficient when compared with conventional [5] and Hussain
[11]. The effects of varying the power supply on delay, power, PDP and EDP are also observed.


Table 1. Simulation results with 90 nm CMOS technology

Parameters    Conventional [5]    Hussain [11]    Proposed
Power (µW)    6.75                7.12            7.49
Delay (ns)    25                  20.7            17.9
PDP (fJ)      168.75              147.38          134.07
EDP (zJs)     4.22                3.05            2.4
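
The PDP and EDP rows of Table 1 follow directly from the power and delay rows. The short Python sketch below recomputes them, assuming PDP = power x delay and EDP = PDP x delay with the units used in the table; the function names are illustrative.

```python
def pdp_fj(power_uw, delay_ns):
    """Power-delay product in fJ (1 uW x 1 ns = 1e-15 J = 1 fJ)."""
    return power_uw * delay_ns


def edp_zjs(power_uw, delay_ns):
    """Energy-delay product in zJ.s (1 fJ x 1 ns = 1e-24 J.s = 1e-3 zJ.s)."""
    return pdp_fj(power_uw, delay_ns) * delay_ns * 1e-3


# (power in uW, delay in ns) taken from Table 1
table1 = {
    "Conventional [5]": (6.75, 25.0),
    "Hussain [11]": (7.12, 20.7),
    "Proposed": (7.49, 17.9),
}
for name, (power, delay) in table1.items():
    print(f"{name}: PDP = {pdp_fj(power, delay):.2f} fJ, "
          f"EDP = {edp_zjs(power, delay):.2f} zJs")
```

The recomputed values agree with the tabulated PDP and EDP to the two decimal places shown.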


To assess the performance of the multiplier further, it is also simulated in 32 nm CMOS technology.
The results are shown in Table 2. The designed multiplier performs better than the other two
multipliers: although the conventional Wallace multiplier consumes the least power, the proposed design has the
best delay, PDP and EDP compared with [5] and [11]. From the results, it is clear that the designed multiplier
has the best EDP, i.e., it is energy-efficient. This is achieved through the algorithm and the structural
optimisation of the design, the latter obtained by using suitable circuit modules. The low-power compressor and
HA used in the PPP, together with the CSA used in the final addition, speed up the operation. To establish the
validity and significance of the results, power, delay, PDP and EDP have been analysed against the power supply,
as discussed below.


Table 2. Simulation results with 32 nm CMOS technology

Parameters    Conventional [5]    Hussain [11]    Proposed
Power (nW)    435.05              499.25          510.11
Delay (ns)    21                  18.47           16.97
PDP (fJ)      9.136               9.221           8.657
EDP (zJs)     0.192               0.170           0.146


3.1. Power analysis against power supply 

Power is one of the most important parameters of any digital circuit or system. Hence,
the total power has been calculated and its variation observed as the supply voltage is varied from
0.6 V to 1.2 V. The lowest supply voltage considered here is 0.6 V because of the minimum threshold voltage
required for 90 nm CMOS technology, whereas 1.2 V is the highest voltage level considered, as per the
ITRS roadmap. The total power consumption reported here is the sum of the static power (when the circuit is in
steady state) and the dynamic power (when the circuit is in transition). The results are shown in Table 3. A
comparison graph against the power supply is shown in Figure 4.


Table 3. Power (µW) analysis against power supply (V)

Power supply (V)    Conventional [5]    Hussain [11]    Proposed
1.2                 6.75                7.12            7.49
1                   5.69                5.94            6.05
0.8                 5.03                4.14            4.65
0.6                 4.684               4.1             3.99

Figure 4. Power (µW) vs. variation of power supply (V)


3.2. Delay analysis against power supply: 

Speed is another important performance parameter of modern nano-electronic
applications. So, the delay has been calculated and its variation with the power supply observed. The delay
is measured from the instant the input reaches one-half of the supply voltage (50% Vdd) to the instant the
latest output signal reaches the same voltage level. Thus, the worst-case delay is recorded. The delay is
calculated at 0.6 V, 0.8 V, 1 V and 1.2 V, and the values are listed in Table 4. The variation of
delay with the power supply is shown in Figure 5. It is observed that the designed multiplier has the best
delay because of its architecture. The compressor used in the partial product
processing unit shortens the data path and thereby speeds up the operation. Another reason is the use of
a carry-save adder in the final step, where the operation starts without waiting for the carry bit (an inherent
property of the CSA), which reduces the delay.
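
The 50% Vdd measurement described above can be illustrated with a small Python sketch operating on sampled waveforms. It is a simplified stand-in for what a circuit simulator's measurement facility does, and it makes two assumptions: crossings are located by linear interpolation between samples, and only the first crossing of each signal is considered (so glitches are ignored). The function names and the example waveforms are hypothetical.

```python
def crossing_time(times, volts, level):
    """Return the first time the waveform crosses `level` (linear interpolation)."""
    for k in range(1, len(times)):
        v0, v1 = volts[k - 1], volts[k]
        if (v0 - level) * (v1 - level) <= 0 and v0 != v1:
            frac = (level - v0) / (v1 - v0)
            return times[k - 1] + frac * (times[k] - times[k - 1])
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, v_in, outputs, vdd):
    """Worst-case delay: input 50% crossing to the latest output 50% crossing."""
    half = 0.5 * vdd
    t_in = crossing_time(times, v_in, half)
    return max(crossing_time(times, v_out, half) for v_out in outputs) - t_in


# Hypothetical waveforms sampled every 1 ns at Vdd = 1.2 V
t = [0.0, 1.0, 2.0, 3.0, 4.0]
v_in = [0.0, 1.2, 1.2, 1.2, 1.2]
out_rising = [0.0, 0.0, 1.2, 1.2, 1.2]
out_falling = [1.2, 1.2, 1.2, 0.0, 0.0]
print(propagation_delay(t, v_in, [out_rising, out_falling], vdd=1.2))
```

Taking the maximum over all outputs is what makes the reported figure a worst-case delay.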


Table 4. Delay (ns) analysis against power supply

Power supply (V)    Conventional [5]    Hussain [11]    Proposed
1.2                 25                  20.7            17.9
1                   27.7                22              20.1
0.8                 32.8                25.5            21.2
0.6                 38.41               28.72           24.3

Figure 5. Delay (ns) vs. variation of power supply (V)


3.3. PDP and EDP analysis against power supply: 

In digital circuits and systems for nano-electronic applications, delay or power alone cannot define the
performance of a circuit. Thus, the power-delay product (PDP) is calculated; it is also known as a figure of
merit and indicates the energy efficiency of the circuit or system. On the other hand, low-PDP circuits may also
operate slowly, so the energy-delay product (EDP) is another metric used to evaluate performance.
Thus, both PDP and EDP are calculated and evaluated as the supply voltage is varied from 0.6 V to 1.2 V.
The PDP and EDP analyses against power supply are shown in Tables 5 and 6, respectively.
Comparison graphs for PDP and EDP against power supply are shown in Figures 6 and 7, respectively. The
proposed multiplier has the lowest PDP and EDP at every supply voltage considered. So, the designed
multiplier can be considered energy-efficient for VLSI circuits and systems.


Table 5. PDP (fJ) analysis against power supply

Power supply (V)    Conventional [5]    Hussain [11]    Proposed
1.2                 168.75              147.38          134.07
1                   157.61              125.93          121.61
0.8                 164.98              119.92          98.58
0.6                 179.91              117.75          96.96
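
The relative PDP reduction of the proposed design can be computed from Table 5. The sketch below does this against the conventional design [5]; the reading that the 20.55%-46% improvement quoted in the abstract refers to this quantity is an inference from the numbers and is not stated explicitly in the text.

```python
# PDP in fJ, copied from Table 5 (keys are supply voltages in V)
pdp_conventional = {1.2: 168.75, 1.0: 157.61, 0.8: 164.98, 0.6: 179.91}
pdp_proposed = {1.2: 134.07, 1.0: 121.61, 0.8: 98.58, 0.6: 96.96}


def improvement_percent(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


for vdd in sorted(pdp_proposed, reverse=True):
    gain = improvement_percent(pdp_conventional[vdd], pdp_proposed[vdd])
    print(f"{vdd:.1f} V: PDP reduced by {gain:.2f}% relative to conventional [5]")
```

The reduction grows as the supply voltage is lowered, from about 20.55% at 1.2 V to about 46% at 0.6 V.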


Table 6. EDP (zJs) analysis against power supply

Power supply (V)    Conventional [5]    Hussain [11]    Proposed
1.2                 4.22                3.05            2.4
1                   4.37                2.67            2.44
0.8                 5.41                3.03            2.09
0.6                 6.91                3.38            2.36

Figure 6. PDP (fJ) vs. variation of power supply (V)
Figure 7. EDP (zJs) vs. variation of power supply (V)
4. CONCLUSION

In this article, a 4x4 multiplier designed in three steps is reported for nano-electronic, control
and automation applications. The 1st step is partial product (PP) generation by using
AND gates. The 2nd step is PP processing (PPP), which is designed in two phases. In the first phase, the
Wallace tree algorithm is used to reduce the PPs, whereas in the second phase the partial products are
computed by using an energy-efficient half adder and a 3:2 compressor: a half adder is used for two rows of PPs
and a 3:2 compressor for three rows. In the 3rd step, the final addition is done by a carry-save
adder (CSA), which speeds up the overall operation. The multiplier has been simulated by using the Synopsys
tool with a 90 nm CMOS PDK. Performance metrics such as power, delay, PDP and EDP of the
multiplier are computed and compared with those of two other multipliers. The effects of varying the power
supply on power, delay, PDP and EDP are also observed. The proposed Wallace multiplier performs
best among the multipliers considered in terms of delay, PDP and EDP.
Thus, the multiplier is energy-efficient and could be a possible alternative for future nano-electronic, control
and automation applications.